81 research outputs found

    Combining learning and constraints for genome-wide protein annotation

    Background: The advent of high-throughput experimental techniques paved the way to genome-wide computational analysis and predictive annotation studies. When considering the joint annotation of a large set of related entities, like all proteins of a certain genome, many candidate annotations could be inconsistent, or very unlikely, given the existing knowledge. A sound predictive framework capable of accounting for this type of constraint when making predictions could substantially contribute to the quality of machine-generated annotations at a genomic scale. Results: We present Ocelot, a predictive pipeline which simultaneously addresses functional and interaction annotation of all proteins of a given genome. The system combines sequence-based predictors for functional and protein-protein interaction (PPI) prediction with a consistency layer enforcing (soft) constraints as fuzzy logic rules. The enforced rules represent the available prior knowledge about the classification task, including taxonomic constraints over each GO hierarchy (e.g. a protein labeled with a GO term should also be labeled with all ancestor terms) as well as rules combining interaction and function prediction. An extensive experimental evaluation on the Yeast genome shows that the integration of prior knowledge via rules substantially improves the quality of the predictions. The system largely outperforms GoFDR, the only high-ranking system at the last CAFA challenge with a readily available implementation, when GoFDR is given access to intra-genome information only (as Ocelot is), and has comparable or better results (depending on the hierarchy and performance measure) when GoFDR is allowed to use information from other genomes. Our system also compares favorably to recent methods based on deep learning.
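The taxonomic constraint mentioned in this abstract (a protein labeled with a GO term should also carry all ancestor terms) can be illustrated with a small sketch. This is not Ocelot's implementation: the toy hierarchy, the term names, and the Łukasiewicz-style violation measure `max(0, s_child - s_ancestor)` are assumptions chosen for illustration only.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy: each term maps to its direct parents.
PARENTS: Dict[str, List[str]] = {
    "GO:child": ["GO:mid"],
    "GO:mid": ["GO:root"],
    "GO:root": [],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every ancestor of `term` by walking parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def violation(scores: Dict[str, float], parents: Dict[str, List[str]]) -> float:
    """Total soft violation of 'term implies ancestor' over all pairs."""
    total = 0.0
    for term, score in scores.items():
        for anc in ancestors(term, parents):
            total += max(0.0, score - scores.get(anc, 0.0))
    return total


def propagate(scores: Dict[str, float], parents: Dict[str, List[str]]) -> Dict[str, float]:
    """Raise each ancestor score to at least the score of its descendants."""
    out = dict(scores)
    for term, score in scores.items():
        for anc in ancestors(term, parents):
            out[anc] = max(out.get(anc, 0.0), score)
    return out


# Hypothetical predictor scores for one protein.
raw_scores = {"GO:child": 0.9, "GO:mid": 0.4, "GO:root": 0.95}
fixed_scores = propagate(raw_scores, PARENTS)
```

Upward propagation is a hard repair that removes every violation; a soft-constraint system such as the one described would instead trade violations off against the predictors' confidence.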

    On Optimization Modulo Theories, MaxSMT and Sorting Networks

    Optimization Modulo Theories (OMT) is an extension of SMT which allows for finding models that optimize given objectives. (Partial weighted) MaxSMT, or equivalently OMT with Pseudo-Boolean objective functions (OMT+PB), is a highly relevant strict subcase of OMT. We classify existing approaches for MaxSMT or OMT+PB in two groups: MaxSAT-based approaches exploit the efficiency of state-of-the-art MaxSAT solvers, but they are special-purpose and not always applicable; OMT-based approaches are general-purpose, but they suffer from intrinsic inefficiencies on MaxSMT/OMT+PB problems. We identify a major source of such inefficiencies, and we address it by enhancing OMT by means of bidirectional sorting networks. We implemented this idea on top of the OptiMathSAT OMT solver. We ran an extensive empirical evaluation on a variety of problems, comparing MaxSAT-based and OMT-based techniques, with and without sorting networks, implemented on top of OptiMathSAT and νZ. The results support the effectiveness of this idea, and provide interesting insights about the different approaches. Comment: 17 pages, submitted at Tacas 1
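As background for why sorting networks are relevant to Pseudo-Boolean objectives, the sketch below sorts a list of Booleans with comparators built from OR and AND, so that output `k` is true exactly when more than `k` inputs are true. It is an illustration only: it uses a plain odd-even transposition network, not the bidirectional sorting networks of the paper, and the function name is hypothetical.

```python
from itertools import product
from typing import List, Sequence


def sort_network(bits: Sequence[bool]) -> List[bool]:
    """Sort Booleans True-first using an odd-even transposition network.

    Each comparator maps (a, b) to (a OR b, a AND b), which is the Boolean
    form of (max, min). Running n rounds sorts any input of length n.
    """
    out = list(bits)
    n = len(out)
    for rnd in range(n):
        # Even rounds compare pairs (0,1), (2,3), ...; odd rounds (1,2), (3,4), ...
        for i in range(rnd % 2, n - 1, 2):
            a, b = out[i], out[i + 1]
            out[i], out[i + 1] = (a or b), (a and b)
    return out


def at_least(bits: Sequence[bool], k: int) -> bool:
    """True when at least k inputs are true (k >= 1), read off the sorted outputs."""
    return sort_network(bits)[k - 1]


# Exhaustive check on 5 inputs: output k is true iff more than k inputs are true.
all_ok = all(
    sort_network(bits)[k] == (sum(bits) > k)
    for bits in product([False, True], repeat=5)
    for k in range(5)
)
```

In a solver, the same comparators would be emitted as clauses over fresh variables, so a bound on the number of violated soft constraints becomes a single unit literal on one output.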

    Modelling wine astringency from its chemical composition using machine learning algorithms

    Aims: The present work aims to predict sensory astringency from wine chemical composition using machine learning algorithms. Material and results: Moristel grapes from different vineblocks and at different stages of ripening were collected. Eleven different wines were produced in 75 L tanks in triplicate, and further sensory factors were described by the rate-all-that-apply method with a trained panel of participants. The polyphenolic composition was characterised in wines by measuring the concentration and activity of tannins using UHPLC-UV/VIS, and the mean degree of polymerisation (mDP) and the composition of tannins using thiolysis followed by UHPLC-MS. Conventional oenological parameters were analysed using FTIR and UV-Vis. Machine learning was applied to build models for predicting a wine's astringency from its chemical composition. The best model was obtained using the support vector regressor (radial kernel) algorithm, with a root-mean-square error (RMSE) of 0.190. Conclusions: The main variables of the astringency model were the percentage of procyanidins constituting tannins and the ethanol content, followed by eight other variables related to tannin structure and acidity. Significance of the study: These results increase the knowledge of chemical variables related to the perception of wine astringency and provide tools to control and optimise grape and wine production stages to modulate astringency and maximise the quality and consumer appeal of wines.
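For readers unfamiliar with the error measure quoted in this abstract, the following minimal sketch computes a root-mean-square error between observed and predicted scores. The numbers are made up for illustration and are not data from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error: sqrt of the mean squared difference."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical astringency scores (not from the paper).
example_error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

The RMSE is expressed in the same units as the predicted variable, so a value such as 0.190 is read against the range of the sensory scale used.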

    Access to wine experts' long-term memory to decipher an ill-defined sensory concept: The case of green red wine

    The present study aims to understand an ill-defined sensory concept by a long-term memory-based strategy with Spanish winemakers from four wine regions using "green wine" as a case study. A total of 77 Spanish winemakers from four Spanish wine regions carried out a non-tasting free description task. The description task yielded terms belonging to two main categories: origin-related terms and sensory terms. Sensory terms belonged to aroma, taste, trigeminal, colour, multimodal and hedonic subcategories, which illustrates the multidimensionality of the studied concept. The most cited specific terms were "vegetal aroma", "bitter" and "unpleasant". Despite these commonalities, a certain idiosyncrasy linked to taste ("excessive sourness") and trigeminal ("astringency") subcategories as well as to wine components ("tannins") was evidenced, as they were cited distinctly by experts belonging to separate wine regions. The capacity of approaches based on long-term memory to decipher multidimensional and ill-defined concepts is highlighted. The regional effect is also explained in terms of cognitive processes (i.e., knowledge and experience), which are linked to the use of sensory concepts by wine experts.

    Application of text mining techniques to the analysis of discourse in eWOM communications from a gender perspective

    The emergence of online user-generated content has raised numerous questions about gender differences in discourse as compared to face-to-face interactions. The intended gender-free equality of the Internet has been challenged by numerous studies, and significant differences have been found in online communications. This paper proposes the application of text mining techniques to online gender discourse through the analysis of shared reviews in electronic word-of-mouth (eWOM) communities, which are a form of user-generated content. More specifically, linguistic analysis, sentiment analysis and content analysis were applied to online reviews from a gender perspective. The methodological approach includes gathering online reviews, pre-processing the collected reviews and a statistical analysis of document features to extract the differences between male and female discourses in a specific product category. Findings reveal not only the discourse differences between women and men but also their different preferences and the feasibility of predicting gender using a set of frequent key terms. These findings are of interest both to retailers, who can adapt their offer to the gender of customers, and to online recommender systems, as the proposed methodology can be used to predict the gender of users in those cases where the gender is not explicitly stated.

    Analysis of landrace cultivation in Europe: A means to support in situ conservation of crop diversity

    During the last century, the progressive substitution of landraces with modern, high-yielding varieties led to a dramatic reduction of in situ conserved crop diversity in Europe. Nowadays there is limited and scattered information on where landraces are cultivated. To fill this gap and lay the groundwork for a regional landrace in situ conservation strategy, information on more than 19,335 geo-referenced landrace cultivation sites was collated from 14 European countries. According to the collected data, landraces of 141 herbaceous and 48 tree species are cultivated across Europe: Italy (107 species), Greece (93), Portugal (45) and Spain (44) hold the highest numbers. Common bean, onion, tomato, potato and apple are the species of main interest in the covered countries. According to the collected data, about 19.8% of landrace cultivation sites are in protected areas of the Natura 2000 network. We also found evidence that 16.7% and 19.3% of conservation varieties of agricultural species and vegetables, respectively, are currently cultivated. The GIS analysis identified 1261 cells (25 km × 25 km) that include all the cultivation sites, distributed across all European biogeographical regions. The data of this study constitute the largest database of in situ-maintained landraces ever produced and the first attempt to create an inventory for the whole of Europe. The availability of such a resource will support better planning of actions and development of policies to protect landraces and foster their use.
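The grid step in this abstract (assigning cultivation sites to 25 km × 25 km cells and counting the occupied cells) can be sketched as follows. The abstract does not state the projection or grid origin used, so this sketch assumes site coordinates are already in a projected system measured in metres with the grid anchored at (0, 0); the function names and sample coordinates are hypothetical.

```python
import math
from typing import Iterable, Set, Tuple

CELL_SIZE_M = 25_000  # 25 km cell edge, as stated in the abstract

Cell = Tuple[int, int]


def cell_of(x_m: float, y_m: float, size: int = CELL_SIZE_M) -> Cell:
    """Return the (column, row) index of the cell containing a point."""
    return (math.floor(x_m / size), math.floor(y_m / size))


def occupied_cells(sites: Iterable[Tuple[float, float]]) -> Set[Cell]:
    """Distinct cells that contain at least one site."""
    return {cell_of(x, y) for x, y in sites}


# Hypothetical projected coordinates in metres (not data from the study).
sample_sites = [(1_000.0, 1_000.0), (24_000.0, 24_999.0), (25_000.0, 0.0), (-1.0, 0.0)]
sample_cells = occupied_cells(sample_sites)
```

Using `math.floor` rather than integer truncation keeps cells consistent for negative coordinates, which matters when the grid origin lies inside the study area.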

    Mitochondrial and nuclear markers reveal a lack of genetic structure in the entocommensal nemertean Malacobdella arrokeana in the Patagonian gulfs

    Malacobdella arrokeana is an entocommensal nemertean exclusively found in the bivalve geoduck Panopea abbreviata, and it is the only representative of the genus in the southern hemisphere. To characterize its genetic diversity, population structure and recent demographic history, we conducted the first genetic survey on this species, using sequence data for the cytochrome oxidase I gene (COI), 16S rRNA (16S) and the internal transcribed spacer (ITS2). Only four different ITS2 genotypes were found in the whole sample, and the two main haplotypes identified in the mitochondrial dataset were present among all localities with a diversity ranging from 0.583 to 0.939. Nucleotide diversity was low (π = 0.001–0.002). No significant genetic structure was detected between populations, and mismatch distribution patterns and neutrality test results are consistent with a population in expansion or under selection. Analysis of molecular variance (AMOVA) revealed that the largest level of variance observed was due to intrapopulation variation (100, 100 and 94.39 % for 16S, COI and ITS2, respectively). Fst values were also non-significant. The observed lack of population structure is likely due to high levels of genetic connectivity in combination with the lack or permeability of biogeographic barriers and episodes of habitat modification. Fil: Fernandez Alfaya, Jose Elias. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Nacional Patagónico; Argentina. Fil: Bigatti, Gregorio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Nacional Patagónico; Argentina. Fil: Machordom, Annie. Consejo Superior de Investigaciones Cientificas. Museo Nacional de Cs. Naturales; Españ

    Transparency and Trust in Human-AI-Interaction: The Role of Model-Agnostic Explanations in Computer Vision-Based Decision Support

    Computer Vision, and hence Artificial Intelligence-based extraction of information from images, has received increasing attention in recent years, for instance in medical diagnostics. While the algorithms' complexity is a reason for their increased performance, it also leads to the "black box" problem, consequently decreasing trust towards AI. In this regard, "Explainable Artificial Intelligence" (XAI) makes it possible to open that black box and to improve the degree of AI transparency. In this paper, we first discuss the theoretical impact of explainability on trust towards AI, and then showcase what the use of XAI in a health-related setting can look like. More specifically, we show how XAI can be applied to understand why Computer Vision, based on deep learning, did or did not detect a disease (malaria) on image data (thin blood smear slide images). Furthermore, we investigate how XAI can be used to compare the detection strategy of two different deep learning models often used for Computer Vision: Convolutional Neural Network and Multi-Layer Perceptron. Our empirical results show that i) the AI sometimes used questionable or irrelevant data features of an image to detect malaria (even if correctly predicted), and ii) there may be significant discrepancies in how different deep learning models explain the same prediction. Our theoretical discussion highlights that XAI can support trust in Computer Vision systems, and AI systems in general, especially through increased understandability and predictability.

    Defining complementary tools to the IVI. The Infrastructure Degradation Index (IDI) and the Infrastructure Histogram (HI)

    The Infrastructure Value Index (IVI) is quickly becoming a standard tool for rapidly assessing the state of urban water infrastructure. However, its simple nature (as a single metric) can mask some valuable information and lead to erroneous conclusions. This paper introduces two complementary tools to the IVI: the Infrastructure Degradation Index (IDI) and the Infrastructure Histogram (HI). The IDI is focused on time (whereas the IVI is focused on value), represents an intuitive concept and behaves in a linear way. The joint analysis of IVI and IDI results in a more complete understanding of the state of the assets, while maintaining the simplicity of the tools. The Infrastructure Histogram allows for a full evaluation of the infrastructure state and provides a detailed picture of network age compared to its expected life, as well as an order of magnitude of the required investments in the following years. Cabrera Rochera, E.; Estruch-Juan, ME.; Gomez Selles, E.; Del Teso-March, R. (2019). Defining complementary tools to the IVI. The Infrastructure Degradation Index (IDI) and the Infrastructure Histogram (HI). Urban Water Journal. 16(5):343-352. https://doi.org/10.1080/1573062X.2019.1669195